Recognizing Text Genres With Simple Metrics Using Discriminant Analysis
Abstract
A simple method for categorizing texts into predetermined genre categories using the standard statistical technique of discriminant analysis is demonstrated with an application to the Brown corpus. Discriminant analysis makes it possible to use a large number of parameters, which may be specific to a certain corpus or information stream, and to combine them into a small number of functions, with the parameters weighted on the basis of how useful they are for discriminating between text genres. An application to information retrieval is discussed.
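The idea of weighting several simple metrics into one discriminating function can be illustrated with a minimal sketch of a two-class Fisher linear discriminant. This is not the paper's implementation: the two metrics (average sentence length and first-person pronouns per 1000 words), the genre labels, and every number in the toy data are assumptions invented for illustration, not measurements from the Brown corpus.

```python
# Two-class Fisher linear discriminant over two simple text metrics.
# ASSUMPTION: all feature values are invented toy numbers, not Brown corpus data.
# Each sample is (average sentence length, first-person pronouns per 1000 words).

def mean_vector(rows):
    """Per-feature mean of a list of 2-element samples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter_matrix(rows, mean):
    """Sum of outer products of deviations from the class mean (2x2)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def score(w, x):
    """Value of the discriminant function for one sample."""
    return w[0] * x[0] + w[1] * x[1]

def fisher_weights(class_a, class_b):
    """Weights w = Sw^-1 (mean_a - mean_b) and a midpoint decision threshold."""
    ma, mb = mean_vector(class_a), mean_vector(class_b)
    sa, sb = scatter_matrix(class_a, ma), scatter_matrix(class_b, mb)
    # Pooled within-class scatter and its explicit 2x2 inverse.
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    threshold = 0.5 * (score(w, ma) + score(w, mb))
    return w, threshold

def classify(w, threshold, x):
    """Assign a sample to the genre on its side of the threshold."""
    return "informative" if score(w, x) > threshold else "imaginative"

# Hypothetical training samples for two broad genre groups.
informative = [(24.0, 3.0), (27.0, 5.0), (22.0, 2.0), (25.0, 6.0)]
imaginative = [(14.0, 21.0), (16.0, 18.0), (12.0, 25.0), (15.0, 20.0)]

weights, threshold = fisher_weights(informative, imaginative)
print("weights:", weights, "threshold:", threshold)
print(classify(weights, threshold, (23.0, 4.0)))
print(classify(weights, threshold, (13.0, 22.0)))
```

The learned weights play the role the abstract describes: a metric that separates the genres well receives a larger weight in the combined function. With more than two genres or many metrics, a library implementation of multi-class discriminant analysis would be the more practical choice than this hand-rolled two-feature version.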
Related resources
Automatic Metrics for Genre-specific Text Quality
To date, researchers have proposed different ways to compute the readability and coherence of a text using a variety of lexical, syntax, entity and discourse properties. But these metrics have not been defined with special relevance to any particular genre but rather proposed as general indicators of writing quality. In this thesis, we propose and evaluate novel text quality metrics that utiliz...
Design, Compilation, and Preliminary Analyses of Balanced Corpus of Contemporary Written Japanese
Compilation of a 100 million words balanced corpus called the Balanced Corpus of Contemporary Written Japanese (or BCCWJ) is underway at the National Institute for Japanese Language and Linguistics. The corpus covers a wide range of text genres including books, magazines, newspapers, governmental white papers, textbooks, minutes of the National Diet, internet text (bulletin board and blogs) and...
Marrying Relevance and Genre Rankings: an Exploratory Study
In this chapter, we discuss different options for using genre-related information in Web search. We conduct an experiment on merging genre-related and text-relevance rankings using a reference Web collection. A method for automatic extraction of formality score akin to readability score using canonical discriminant analysis applied to a sample of genres with decreasing formality is proposed. Ef...
On Recognizing Argumentation Schemes in Formal Text Genres, by Nancy L. Green
Argumentation mining research should address the challenge of recognition of argumentation schemes in formal text genres such as scientific articles. This paper argues that identification of argumentation schemes differs from identification of other aspects of discourse such as argumentative zones and coherence relations. Argumentation schemes can be defined at a level of abstraction applicable...
Modeling Communicative Purpose with Functional Style: Corpus and Features for German Genre and Register Analysis
While there is wide acknowledgement in NLP of the utility of document characterization by genre, it is quite difficult to determine a definitive set of features or even a comprehensive list of genres. This paper addresses both issues. First, with prototype semantics, we develop a hierarchical taxonomy of discourse functions. We implement the taxonomy by developing a new text genre corpus of con...
Publication date: 1994